-
Notifications
You must be signed in to change notification settings - Fork 159
Update VerifyDevicePath logic to run udevadm --trigger to fix scenarios where disk is not found or is wrong device #459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidz627 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Wrote the e2e test and confirmed its failing before the |
@@ -49,8 +49,8 @@ func GCEClientAndDriverSetup(instance *remote.InstanceInfo) (*remote.TestContext | |||
endpoint := fmt.Sprintf("tcp://localhost:%s", port) | |||
|
|||
workspace := remote.NewWorkspaceDir("gce-pd-e2e-") | |||
driverRunCmd := fmt.Sprintf("sh -c '/usr/bin/nohup %s/gce-pd-csi-driver --endpoint=%s> %s/prog.out 2> %s/prog.err < /dev/null &'", | |||
workspace, endpoint, workspace, workspace) | |||
driverRunCmd := fmt.Sprintf("sh -c '/usr/bin/nohup %s/gce-pd-csi-driver -v=4 --endpoint=%s 2> %s/prog.out < /dev/null > /dev/null &'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small fix to improve debugging experience. This is on purpose
@@ -28,10 +28,6 @@ import ( | |||
mountmanager "sigs.k8s.io/gcp-compute-persistent-disk-csi-driver/pkg/mount-manager" | |||
) | |||
|
|||
func init() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had two init functions before, I don't know if that was causing issues but it doesn't hurt to fix
4bd3d05
to
a2c0480
Compare
// candidate devicePaths or an empty string if none is found. If | ||
// /lib/udev/scsi_id exists it will attempt to fix any issues caused by missing | ||
// paths or mismatched devices by running a udevadm --trigger. | ||
func (m *deviceUtils) VerifyDevicePath(devicePaths []string, diskName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need devicePaths as an argument, or can we generate it inside this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't really have to, but then we would need to add partition as an argument. (I can follow up but probably out of scope for this PR)
7ad145d
to
ea9fc37
Compare
} | ||
|
||
// ValidateLogicalLinkIsDisk takes a symlink location at "link" and finds the | ||
// link location - it then finds the backing PD using scsi_id and validates that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now, although when NVMe comes up, this test will get outdated. I think writing data and verifying it would be more portable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is writing data would work if the symlink was made incorrectly, so I'm not sure it validates what we care about. (Which is that the volume matches the device). Writing this check was much harder than writing data :)
I don't think network attached PD supports NVME. Only available for local ssd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's no NVME for PD now, I'm just saying for future proofing.
…os where disk is not found or is wrong device
/lgtm |
TODO:
/dev/by-id/{diskName}
to the wrong device or deletes the/dev/by-id/{diskName}
entirely then make sureNodeStage
is able to self-heal that usingudevadm --trigger
/kind bug
/assign @msau42
Fixes #376
/lib/udev/scsi_id
was pointing to the binary on the host due to the/lib/udev
hostPath. This was causing weird package dynamically linked library finding issues (couldn't find the right version of dynamic libs). The solution is to use the/lib/udev/scsi_id
from the container by copying it into a special directory. Thanks @msau42 for pointing out that/lib/udev
was coming from the host